Circulating miR-206 and miR-1246 as Markers in the Early Diagnosis of Lung Cancer in Patients with Chronic Obstructive Pulmonary Disease

Lung cancer (LC) is the most common cause of cancer death, with 75% of cases being diagnosed in late stages. This study aimed to identify miRNAs with potential as biomarkers for the early detection of LC in patients with chronic obstructive pulmonary disease (COPD). Ninety-nine patients with COPD were included, and their clinical and lung function parameters were recorded over 6 years of follow-up. miRNAs were profiled by next-generation sequencing (NGS) in 16 serum samples from eight COPD patients (four who developed LC and four controls), collected at LC diagnosis and 3 years before. Validation by qPCR was performed in 33 COPD-LC patients and 66 controls at the two time points. Over 170 miRNAs (≥10 TPM) were identified; among these, miR-224-5p, miR-206, miR-194-5p, and miR-1246 were significantly dysregulated (p < 0.001) in COPD-LC patients 3 years before LC diagnosis compared with controls. The validation showed that miR-1246 and miR-206 were differentially expressed in COPD patients who developed LC, 3 years before diagnosis (p = 0.035 and p = 0.028, respectively). In silico enrichment analysis linked miR-1246 and miR-206 to gene mediators in various cancer-related signaling pathways. Our study demonstrated that miR-1246 and miR-206 have potential value as non-invasive biomarkers for early LC detection in COPD patients, who could benefit from screening programs.


Introduction
Lung cancer is the second most prevalent cancer and the most common cause of cancer death [1], with more than 2.2 million new cases of lung cancer in 2020 and 1.80 million deaths globally.

miRNA Screening Study
A total of eight individuals (four COPD patients with LC and four COPD patients who did not develop LC during follow-up, used as controls) were included in the screening study, with biological samples at two time points (at the time of LC diagnosis and 3 years before). Cases and controls were male, age-matched (mean age 64 years), and had similar smoking habits. Their main clinical characteristics are shown in Tables S1 and S2. Most of the COPD patients (90%) had moderate airway obstruction.
Smokers with COPD and LC had higher FEV1 and lower KCO values than COPD controls, but a similar proportion of emphysema on CT scan. The histological subtype of all LC cases was non-small-cell lung cancer (NSCLC) (75% adenocarcinoma, 25% squamous carcinoma).

Figure 1.
Top genes with the highest variance across samples were selected for hierarchical clustering. Each row represents one gene, and each column represents one sample. The color represents the difference of the count value from the row mean (−2 to 4).

Each gene's fold change (FC) is plotted against its mean expression among all samples three years before LC diagnosis. All significantly differentially expressed genes are marked in red. Significant changes are defined as p-value < 0.001, FDR < 0.01, and log2FC > 2.

Validation Study
The validation of the miRNA expression analysis was performed on all 99 participants included in the study at the two time points: at the time of LC diagnosis (33 LC cases, 66 controls) and three years before (21 LC cases and 30 controls). The main clinical characteristics of cases and controls are shown in Tables 2 and 3. Most patients were men (85%) and heavy smokers, with a mean age of 63 years. The patients with COPD and LC had higher FEV1 and lower KCO values than the COPD controls, but a similar proportion of emphysema. The tumors were mostly non-small-cell lung cancers (NSCLC) (61% adenocarcinoma and 27% squamous carcinoma); other subtypes were microcytic carcinoma (3%) and undifferentiated carcinoma (two cases).

* Data are presented as mean ± SD. ** Data are presented as median (25th–75th percentile). ‡ Number of packs of cigarettes smoked per day × number of years of smoking. BMI: body mass index; FEV1: forced expiratory volume in one second; FVC: forced vital capacity; % pred: percent predicted; PaO2: arterial partial pressure of oxygen; KCO: transfer coefficient of the lung for carbon monoxide (DLCO/VA); IC/TLC: inspiratory capacity to total lung capacity ratio; 6MWD: six-minute walking distance; BODE index: body mass index, airflow obstruction, dyspnea, and exercise capacity index. † Emphysema diagnosed by CT scan. § This measure was available for 11 individuals with COPD and LC. p-values < 0.05 are shown in bold.
In the validation experiments, miR-224 was detected in less than 85% of the studied population, so it was excluded from further analysis. miR-1246 was overexpressed 3 years before LC diagnosis in patients with COPD who developed LC compared with COPD controls who did not develop LC (log2FC = 2.63, p = 0.032) (Figure 3). A significant difference in miR-1246 expression was also found at the time of LC diagnosis (log2FC = −4.79, p = 0.022) (Figure S1). miR-206 was also dysregulated between cases and controls 3 years before diagnosis (log2FC = −2.205, p = 0.027) (Figure 3), and this difference persisted at the time of LC diagnosis (log2FC = −1.624, p = 0.023) (Figure S1).
The expression of miR-194-5p did not differ significantly between the study groups at either time point (Figure S1).

Clinical Relations
The expression levels of miR-26a-5p and miR-194-5p in COPD patients who went on to develop LC within the next 3 years correlated significantly with arterial oxygen tension (PaO2) (r = 0.74, p = 0.035, and r = 0.97, p < 0.001, respectively), although this measure was available for only a small number of individuals (Figure S2). No relationship was found between the expression of any other miRNA and the other clinical variables in COPD individuals with LC.

Functional Annotation Analysis and Gene Target Prediction
Our functional analysis of KEGG pathways revealed miR-1246 to be significantly associated with target genes enriched in several functions and signaling pathways. The most significant are reported in Table 4A: viral carcinogenesis, the apoptosis pathway, the TP53 signaling pathway, and central carbon metabolism in cancer. miR-206, on the other hand, was significantly associated with the enriched target genes detailed in Table 4B: glycosphingolipid biosynthesis, miRNAs in cancer, proteoglycans in cancer, and transcriptional misregulation in cancer. GO analysis of the combined target genes of miR-1246 and miR-206 showed enrichment in several functions, including the cellular nitrogen metabolic process, the biosynthetic process, the protein modification process, gene expression, cellular component or protein assembly, membrane organization, and the mitotic cell cycle (Table S3).
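Pathway enrichment of predicted miRNA targets, as in the KEGG and GO analyses above, is commonly scored with a one-sided hypergeometric (over-representation) test. The text does not name the enrichment tool or test used, so the sketch below only illustrates that general approach; the function name and the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeometric_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p-value, P(X >= k).

    X is hypergeometric: n genes (the predicted miRNA targets) are drawn
    without replacement from a background of N genes, K of which are
    annotated to the pathway; k is the observed overlap.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the tail sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    # Exact integer arithmetic, converted to float only at the final division
    return tail / comb(N, n)


# Hypothetical example: 500 predicted targets, 10 of which fall in a
# 100-gene pathway, against a 20,000-gene background.
p = hypergeometric_enrichment_p(k=10, n=500, K=100, N=20_000)
print(f"enrichment p = {p:.3g}")
```

In practice these p-values are computed for every pathway tested and then corrected for multiple testing before pathways are ranked.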

Discussion
The presence of chronic obstructive pulmonary disease (COPD) is an important risk factor for developing lung cancer. Considerable effort is being made to identify biomarkers that, combined with conventional tests, can help identify individuals who are more likely to develop LC and who are candidates for close monitoring and early diagnosis. This study is the first to describe the altered expression of two circulating miRNAs, hsa-miR-1246 and hsa-miR-206, that can be detected in the serum of patients with COPD up to three years before lung cancer diagnosis and could help identify subjects at high risk of LC who are likely to benefit from screening programs.

miR-1246 and Predictive Risk of Lung Cancer
In our study, miR-1246 was overexpressed in individuals with COPD who developed LC three years before diagnosis, and this differential expression was maintained at the time of diagnosis when compared to COPD controls who did not develop LC. Moreover, we detected increased levels of this miRNA in patients who developed LC in relation to the presence of emphysema, a well-known risk factor for LC [6]. Our findings agree with those of Yang et al., who, in a small group of patients, identified miR-1246 as the most upregulated miRNA in the serum of patients with LC compared with healthy controls [20]. However, that study was conducted only at the time of diagnosis and therefore does not inform the value of miR-1246 as a tool for early LC screening. Interestingly, these authors also reported that this miRNA increased the migration and invasiveness of A549 lung cancer cells. Specifically, they observed that E-cadherin expression decreased while vimentin and transforming growth factor-β expression increased, indicating a possible role of miR-1246 in the Wnt/β-catenin pathway and in cancer progression and metastasis. Similarly, Kim et al. found miR-1246 to be overexpressed in sphere-forming cells, and an anti-miR-1246 strategy effectively suppressed the proliferation, sphere formation, colony-forming ability, and invasiveness of cancer cells [21]. Other authors have also documented a role for miR-1246 in the progression of non-small-cell lung cancer [22].
Our functional in silico analysis suggests that miR-1246 may be linked to gene mediators in various cancer-related signaling pathways. In the KEGG pathway analysis, miR-1246 target genes were enriched in, for example, the p53 signaling pathway. miR-1246 has also been identified as a novel p53 target miRNA: Liao et al. reported that p53 inhibits DYRK1A expression through the induction of miR-1246 in non-small-cell lung cancer cells [23]. Several studies indicate that p53, a typical tumor suppressor gene, is involved in NSCLC development and progression [24]. BCL2L2, a member of the BCL-2 family, is a predicted target of miR-1246. In this context, increased Bcl-2 protein expression has been shown to contribute to the development of a wide variety of human cancers, including lung cancer, and may act as a resistance factor against several anticancer agents [25]. Another predicted target of miR-1246 is HER4 (ERBB4), a tyrosine kinase receptor. EGFR, a plasma membrane glycoprotein, belongs to this same family of receptors. EGFR dimerization may result in cancer cell proliferation, inhibition of apoptosis, invasion, metastasis, and tumor-induced neovascularization [26].

miRNA-206 and Predictive Risk of Lung Cancer
Another important finding was that miR-206 dysregulation can be detected in patients with COPD up to three years before they are diagnosed with lung cancer. Low miR-206 expression has been associated with lung cancer invasion and metastasis [27,28]. Expression of miR-1, which is closely related to miR-206, was found to be decreased in lung cancers [29]. It was reported that treating lung cancer cells with a histone deacetylase (HDAC) inhibitor could restore the expression of repressed miR-1, leading to the downregulation of oncogenes such as MET, PIM1, FOXP1, and HDAC4 [29]. MET is a common target in three of the pathways detected for miR-206 in the in silico analysis of the present study: miRNAs in cancer, proteoglycans in cancer, and transcriptional misregulation in cancer.
Interestingly, Yang et al. found that forced miR-206 expression restored gefitinib sensitivity in IL6-induced gefitinib-resistant EGFR-mutant lung cancer cells by inhibiting the IL6/JAK1/STAT3 pathway [30]. Furthermore, studies in NSCLC A549 cells showed that miR-206 antagomir therapy decreased tumor volume and the formation of intra-tumoral capillary tubes, and increased apoptosis by blocking the 14-3-3ζ/STAT3/HIF-1α/VEGF signaling pathway [31]. miR-206 has been reported to function as a tumor metastasis suppressor in different types of cancer, including colon and gastric cancers [32,33].

miRNAs and Their Relation to Pulmonary Function
We found a strong correlation between miR-26a-5p and miR-194-5p expression and arterial partial pressure of oxygen (PaO2) in COPD patients who went on to develop LC within the next 3 years. Other studies have reported increased levels of miR-26a in COPD patients with chronic hypoxia [34] and in relation to apnea [35]. These findings highlight the importance of hypoxemia in stimulating the expression of certain miRNAs. The transcription factor hypoxia-inducible factor 1α (HIF-1α) is a well-known link between epigenetic mechanisms and oxidative stress [36]. Hypoxia, or a hypoxic microenvironment, triggers angiogenesis, and the cellular response to hypoxia is primarily regulated by HIF-1α. Since miR-26a-5p is released from endothelial cells, its increased expression supports a relation between endothelial dysfunction and the development and progression of cancer in this group of COPD patients at increased LC risk.
The present study has some limitations. First, the present findings refer to circulating biomarkers, not to markers in lung tissue biopsies, as these were not available. However, the main objective was to test the predictive value of these biomarkers in non-invasive samples. Circulating miRNAs have been widely used as biomarkers because of their resistance to degradation and their ubiquity. Nevertheless, further studies are needed to confirm the co-expression of the proposed miRNAs and their target mRNAs, together with functional in vitro experiments on miRNA target regulation. Second, the number of subjects with COPD and LC included in the study may be considered small. However, this is a prospective study that included a large, well-characterized cohort of COPD patients with repeated biological samples and a long follow-up until the development and diagnosis of LC. This design yielded informative results and lays the groundwork for further research on miRNA-based approaches in lung cancer. Finally, the AUC for predicting lung cancer risk was 0.7 for both miRNAs. This value is modest, but discrimination could be improved by adding traditional screening tests such as chest CT or other early diagnostic markers yet to be defined. We also recognize that the present findings must be validated in another large prospective cohort of LC patients, with samples analyzed at several time points.
In conclusion, our study demonstrated that miR-1246 and miR-206 have potential value as non-invasive biomarkers of early lung cancer detection in COPD patients. A confirmatory study in a larger prospective cohort is needed to test this signature and evaluate its use in screening and early lung cancer management.

Study Individuals
Ninety-nine individuals with a diagnosis of COPD were included in this study. They were drawn from two cohorts: 263 smokers recruited and followed annually at the Hospital Universitario N/S de Candelaria, Tenerife, Spain, as part of the BODE cohort [37], and 3825 individuals from a lung cancer screening program (Pamplona-International early detection program, P-IELCAP) [38]. Of these patients, 13 individuals in the Tenerife cohort and 20 in the Pamplona cohort developed LC during follow-up and were enrolled in the study. Of the 33 patients with LC, 21 had two blood samples obtained: one at LC diagnosis and the other 3 years before. From the pool of COPD patients who did not develop LC during follow-up, 66 age- and gender-matched individuals were selected as controls. All subjects were followed annually for a mean of 72 months.
Inclusion criteria were age > 40 years, smoking history > 15 pack-years, post-bronchodilator FEV1/FVC ratio < 0.70, and clinical stability for at least 6 weeks at the time of evaluation. Spirometry, lung volumes, diffusion capacity for carbon monoxide, and exercise capacity were measured according to ATS-ERS guidelines [39,40]. Dyspnea was evaluated using the mMRC scale [41], the BODE index was calculated as previously described [42], and comorbidities were quantified using the Charlson index [43]. A pulmonologist visually scored baseline CT scans for the presence of emphysema using criteria established by the Fleischner Society [44]. All-cause mortality was recorded. Exclusion criteria included any other respiratory disease and uncontrolled comorbidities such as malignancy at baseline. None of the patients had received any anti-tumor treatment at the time of sample collection.
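The numeric inclusion thresholds and the BODE index can be written as simple scoring rules. The sketch below assumes that reference [42] uses the point bands of the original BODE publication (Celli et al., 2004); the `Patient` fields and all function names are hypothetical, and clinical stability and the exclusion criteria are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: float             # years
    packs_per_day: float
    years_smoked: float
    fev1_fvc: float        # post-bronchodilator FEV1/FVC ratio
    fev1_pct_pred: float   # FEV1, % predicted
    mmrc: int              # mMRC dyspnea grade, assumed 0-4
    six_mwd_m: float       # six-minute walking distance, metres
    bmi: float             # kg/m^2


def pack_years(p: Patient) -> float:
    # Packs smoked per day multiplied by years of smoking
    return p.packs_per_day * p.years_smoked


def meets_inclusion(p: Patient) -> bool:
    # Age, smoking-history and airflow-obstruction criteria only
    return p.age > 40 and pack_years(p) > 15 and p.fev1_fvc < 0.70


def bode_index(p: Patient) -> int:
    # Obstruction, FEV1 % predicted: >=65 -> 0, 50-64 -> 1, 36-49 -> 2, <=35 -> 3
    if p.fev1_pct_pred >= 65:
        obstruction_pts = 0
    elif p.fev1_pct_pred >= 50:
        obstruction_pts = 1
    elif p.fev1_pct_pred >= 36:
        obstruction_pts = 2
    else:
        obstruction_pts = 3
    # Dyspnea, mMRC grade: 0-1 -> 0, 2 -> 1, 3 -> 2, 4 -> 3
    dyspnea_pts = max(0, p.mmrc - 1)
    # Exercise, 6MWD in metres: >=350 -> 0, 250-349 -> 1, 150-249 -> 2, <=149 -> 3
    if p.six_mwd_m >= 350:
        exercise_pts = 0
    elif p.six_mwd_m >= 250:
        exercise_pts = 1
    elif p.six_mwd_m >= 150:
        exercise_pts = 2
    else:
        exercise_pts = 3
    # Body mass index: >21 -> 0, <=21 -> 1
    bmi_pts = 0 if p.bmi > 21 else 1
    return bmi_pts + obstruction_pts + dyspnea_pts + exercise_pts  # total 0-10


# Hypothetical patient
example = Patient(age=64, packs_per_day=1, years_smoked=40, fev1_fvc=0.55,
                  fev1_pct_pred=55, mmrc=2, six_mwd_m=400, bmi=26)
print(pack_years(example), meets_inclusion(example), bode_index(example))
```

Higher BODE scores indicate more severe disease; keeping each component as a separate band makes it easy to check a score against the published table.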
The study was approved by the ethical committee board of Hospital Universitario N/S de Candelaria (PI 55/17, CHUNSC-2018-29). Written informed consent was obtained from all participants. The research was conducted in accordance with the Declaration of Helsinki.

Sample Collection
Serum samples (a total of 162) were collected from 33 COPD patients who developed LC during follow-up and 66 COPD controls without LC at the time of LC diagnosis. Of these, 21 patients had serum samples obtained 3 years before LC diagnosis, which were matched with samples from 30 controls at the same time point. The serum was separated from whole blood within 1 h of collection and stored at −80 °C until further use in the genetic study.

NGS miRNA Screening Assay
Sixteen serum samples from eight individuals (four COPD patients with LC and four COPD patients who did not develop LC during follow-up, used as controls), collected at two time points (at the time of LC diagnosis and 3 years before), were analyzed for miRNA screening. RNA was isolated from 200 µL of serum using the miRNeasy Serum/Plasma Advanced Kit (Qiagen). Library preparation was carried out using the QIAseq miRNA Library Kit (QIAGEN). A total of 5 µL of total RNA was converted into miRNA NGS libraries. miRNA expression screening was performed by next-generation sequencing (NGS) on an Illumina NextSeq 500 platform (Exiqon A/S, Copenhagen, Denmark). Known miRNAs were counted by mapping reads to the corresponding entries in the miRBase v20 database (http://mirbase.org). miRNA expression was reported as tags per million (TPM), that is, the number of reads for a particular miRNA per million reads. miRNAs that were stably expressed across all samples were identified using NormFinder software [45].
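Tags per million rescale raw read counts so that libraries sequenced to different depths can be compared. The sketch below assumes the denominator is the total number of reads assigned to miRNAs in each sample (pipelines differ on this point) and applies the ≥10 TPM expression cut-off quoted in the abstract within a single sample; the function names and counts are hypothetical.

```python
def tags_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw miRNA read counts for one sample into TPM."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    # Multiply before dividing so integer counts give exact results where possible
    return {mirna: c * 1_000_000 / total for mirna, c in counts.items()}


def expressed_mirnas(tpm: dict[str, float], threshold: float = 10.0) -> set[str]:
    """miRNAs at or above the TPM threshold in this sample."""
    return {mirna for mirna, value in tpm.items() if value >= threshold}


# Hypothetical library of 1,000,000 miRNA reads
sample = {"miR-a": 999_000, "miR-b": 995, "miR-c": 5}
tpm = tags_per_million(sample)
print(tpm)
print(expressed_mirnas(tpm))
```

Whether a miRNA must pass the cut-off in one sample, in every sample, or on average is a design choice the text does not specify.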

Validation Assay: Quantitative RT-PCR for miRNA Expression
The most abundant miRNAs identified in the NGS screening step were validated by qRT-PCR in all study participants at two time points, at the time of LC diagnosis and 3 years before (a total of 162 serum samples). RNA was isolated from serum samples, and cDNA was synthesized by reverse transcription using the miRCURY LNA RT Kit (Qiagen Inc., Hilden, Germany). The spike-ins UniSp2, UniSp4, and UniSp5 were used to control RNA isolation. The spike-ins cel-miR-39-3p and UniSp6 were used together as controls for cDNA synthesis and amplification, as indicated by the manufacturer (miRCURY LNA RT Kit and RNA Spike-in Kit, Qiagen Inc., Hilden, Germany). The expression of the significant miRNAs from the first-stage NGS screening was determined by real-time PCR using miRCURY LNA SYBR Green PCR Master Mix (Qiagen Inc., Hilden, Germany). Each 10 µL reaction contained 1X SYBR Green Master Mix, 200 nmol/L specific primer set (miRCURY LNA miRNA PCR assay, Qiagen Inc., Hilden, Germany), and 3 ng cDNA. All samples were run in triplicate. The experiments were performed on a StepOnePlus real-time PCR system (Applied Biosystems, Foster City, CA, USA) under the following conditions: 95 °C for 15 min, followed by 40 cycles of 94 °C for 15 s, 55 °C for 30 s, and 72 °C for 30 s. hsa-miR-486-5p and hsa-let-7a-5p, the best candidates in the NGS screening analysis, were used as reference genes for normalization. A no-template control was included in each experiment. The relative expression of the target miRNAs was analyzed using the comparative threshold cycle (2^−ΔΔCt) method [46].
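The 2^−ΔΔCt calculation can be sketched as follows. It assumes that technical triplicates are averaged first, that the two reference miRNAs are combined by averaging their Ct values (equivalent to a geometric mean of their relative quantities), and that the calibrator is the mean ΔCt of the control group; the method itself also assumes close to 100% amplification efficiency. Function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts: list[float], reference_cts: list[list[float]]) -> float:
    """Mean Ct of the target minus the mean Ct of the reference miRNAs.

    target_cts: technical replicates for the target miRNA in one sample
    reference_cts: one list of technical replicates per reference miRNA
    """
    target = mean(target_cts)
    reference = mean(mean(reps) for reps in reference_cts)
    return target - reference


def fold_change(sample_dct: float, calibrator_dct: float) -> float:
    """Relative expression 2^-(ddCt) versus the calibrator."""
    ddct = sample_dct - calibrator_dct
    return 2.0 ** (-ddct)


# Hypothetical triplicate Cts for one case sample: target, then two references
case_dct = delta_ct([25.0, 25.1, 24.9], [[20.0, 20.1, 19.9], [22.0, 22.0, 22.0]])
# Hypothetical mean dCt of the control group, used as calibrator
control_mean_dct = 6.0
fc = fold_change(case_dct, control_mean_dct)
print(f"fold change: {fc:.2f}, log2FC: {-(case_dct - control_mean_dct):.2f}")
```

Because log2(2^−ΔΔCt) = −ΔΔCt, log2 fold changes such as those reported in the validation results are simply the negated ΔΔCt values.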

Statistical Analysis
The sample was characterized using summary statistics: relative frequencies for categorical variables, median (5th–95th percentile) for non-normally distributed variables, and mean ± SD for normally distributed variables, as appropriate. Comparisons between cases and controls were carried out using Pearson's chi-square test, Fisher's exact test, the Mann-Whitney U test, the Wilcoxon test, and the Kruskal-Wallis test. Correlations between variables were estimated using Spearman's or Pearson's correlation coefficients. miRNA expression analysis was performed at two time points: at LC diagnosis and 3 years before (median follow-up, 36 months). Cases and controls were matched for age and gender.
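Spearman's coefficient is Pearson's correlation computed on ranks, with tied values given their average rank. A minimal standard-library sketch follows; it returns only the coefficient (the p-value would come from a statistical package such as SPSS or R), and the function names and paired values are hypothetical.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either variable is constant


def spearman_rho(x: list[float], y: list[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired values: miRNA relative expression vs PaO2 (mmHg)
expression = [0.8, 1.5, 1.1, 2.3, 1.9]
pao2 = [62, 74, 70, 80, 71]
print(round(spearman_rho(expression, pao2), 3))
```

Ranking makes the coefficient robust to outliers and to monotonic but non-linear relationships, which is why it is often preferred for small clinical subsamples.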
The R statistical software package [49] and DESeq2 software [50] were used to perform the miRNA differential expression analysis. Samples were normalized using the trimmed mean of M-values (TMM) method, which is based on log-fold and absolute gene-wise changes in expression levels between samples. Fold change (FC) values were log2-transformed. A miRNA was considered a candidate for validation when log2FC > 1.5. The Benjamini-Hochberg false discovery rate (FDR) procedure was used to correct the p-values for multiple testing [50], where the FDR is defined as the expected proportion of false rejections among all rejected hypotheses.
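The Benjamini-Hochberg adjustment can be written in a few lines: sort the m p-values, scale the i-th smallest by m/i, and enforce monotonicity from the largest down, capping at 1. DESeq2 reports these adjusted values itself (its `padj` column); the stand-alone sketch below only illustrates the procedure, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value to the smallest so each adjusted value
    # is the minimum of p_(j) * m / j over all ranks j >= its own rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

A miRNA is then declared significant when its adjusted value falls below the chosen FDR level (for example, FDR < 0.01 in the screening figure).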
In the multivariable logistic regression analysis, the following covariates were included: smoking (pack-years), BMI, and the presence of emphysema (visual score on CT scan). The sensitivity and specificity of each miRNA for the early identification of patients who developed lung cancer were determined using receiver operating characteristic (ROC) curves, and the area under the ROC curve (AUC) of each test was calculated for direct comparison. SPSS 25.0 (IBM Corp.) and R software were used for all statistical analyses, and two-tailed p-values < 0.05 were considered significant.
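The AUC equals the probability that a randomly chosen case scores higher than a randomly chosen control (ties counted as one half), which is the Mann-Whitney U statistic divided by the number of case-control pairs. The sketch below computes this and the sensitivity and specificity at one cut-off. It assumes higher scores indicate LC, so a downregulated marker such as miR-206 would first be negated; function names, scores, and the threshold are hypothetical, and this is not the study's SPSS or R analysis.

```python
def roc_auc(case_scores: list[float], control_scores: list[float]) -> float:
    """AUC via pairwise comparison (Mann-Whitney U / (n_cases * n_controls))."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5  # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores: list[float], control_scores: list[float],
                            threshold: float) -> tuple[float, float]:
    """Classify score >= threshold as positive (predicted LC)."""
    tp = sum(s >= threshold for s in case_scores)
    tn = sum(s < threshold for s in control_scores)
    return tp / len(case_scores), tn / len(control_scores)


# Hypothetical log2 relative-expression values
cases = [2.1, 1.4, 0.3, 2.8]
controls = [0.2, 1.0, 0.5, -0.4, 1.6]
print(roc_auc(cases, controls))
print(sensitivity_specificity(cases, controls, threshold=1.2))
```

Sweeping the threshold over all observed scores and plotting sensitivity against 1 − specificity traces the full ROC curve; an AUC of 0.5 corresponds to chance and 1.0 to perfect separation.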

Data Availability Statement:
The datasets supporting the conclusions of this article are included within the article and its additional files.